
    Fast Fourier Transforms on Distributed Memory Parallel Machines

    One issue that is central in developing a general-purpose subroutine for a distributed memory parallel machine is the data distribution. Users may wish to call the subroutine with different data distributions, so there is a need to design algorithms for distributed memory parallel machines that can support a variety of data distributions. In this dissertation we address the problem of developing such algorithms to compute the Discrete Fourier Transform (DFT) of real and complex data. The implementations given in this dissertation work for a class of data distributions commonly encountered in scientific applications, known as block scattered data distributions, and are targeted at distributed memory parallel machines. We also address the problem of rearranging the data after computing the FFT. For the DFT of complex data, we use a standard radix-2 FFT algorithm, which has been studied extensively in parallel environments. Two ways of computing the DFT of real data are known to be efficient in serial environments: (i) the real fast Fourier transform (RFFT) algorithm and (ii) the fast Hartley transform (FHT) algorithm. In distributed memory environments, however, both have excessive communication overhead. We restructure the RFFT and FHT algorithms to reduce this overhead, and then use the restructured algorithms in generalized implementations that work for block scattered data distributions. Experimental results are given for the restructured RFFT and FHT algorithms on two parallel machines: the NCUBE-7, a hypercube MIMD machine, and the AMT DAP-510, a mesh SIMD machine. The performance of the FFT, RFFT, and FHT algorithms with block scattered data distributions was evaluated on the Intel iPSC/860, a hypercube MIMD machine.
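
    As a rough illustration of the block scattered (block-cyclic) distribution the dissertation targets, the sketch below maps a global array index to a (processor, local index) pair. The block size, processor count, and function name are illustrative assumptions, not the dissertation's actual code, which targets machines such as the NCUBE-7 and iPSC/860.

```python
# A minimal sketch (not the dissertation's implementation) of a block
# scattered (block-cyclic) mapping: a global array is split into blocks of
# size `block_size`, and blocks are dealt to the processors round-robin.

def block_scattered_owner(global_index, block_size, num_procs):
    """Return (processor, local_index) for one global index under a
    block-cyclic distribution with the given block size."""
    block = global_index // block_size    # which block the element falls in
    offset = global_index % block_size    # position inside that block
    proc = block % num_procs              # blocks are dealt round-robin
    local_block = block // num_procs      # blocks this processor already holds
    return proc, local_block * block_size + offset

if __name__ == "__main__":
    n, b, p = 16, 2, 4                    # hypothetical sizes for illustration
    for i in range(n):
        print(i, block_scattered_owner(i, b, p))
```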

    Extensible Component Based Architecture for FLASH, A Massively Parallel, Multiphysics Simulation Code

    FLASH is a publicly available high-performance application code that has evolved from a collection of unconnected legacy codes into a modular, extensible software system. FLASH has been successful because its capabilities have been driven by the needs of scientific applications, without compromising maintainability, performance, or usability. In its newest incarnation, FLASH3 consists of interoperable modules that can be combined to generate different applications. The FLASH architecture allows arbitrarily many alternative implementations of its components to coexist and be interchanged with one another, resulting in greater flexibility. Further, a simple and elegant mechanism exists for customizing code functionality without modifying the core implementation of the source. A built-in unit test framework providing verifiability, combined with a rigorous software maintenance process, allows the code to operate simultaneously in the dual modes of production and development. In this paper we describe the FLASH3 architecture, with emphasis on solutions to the more challenging conflicts arising from solver complexity, portable performance requirements, and legacy codes. We also include results from user surveys conducted in 2005 and 2007, which highlight the success of the code.
    Comment: 33 pages, 7 figures; revised paper submitted to Parallel Computing
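
    As a loose illustration of the architectural idea of interchangeable component implementations, the sketch below registers alternative implementations of one component side by side and selects one at application-assembly time. It is written in Python rather than FLASH's own language, and every name in it (Hydro, SplitHydro, UnsplitHydro, assemble_application) is hypothetical, not taken from FLASH3.

```python
# A minimal sketch of a component-based design in which several alternative
# implementations of a unit co-exist and an application is assembled by
# choosing one per unit via configuration.

class Hydro:
    """Abstract interface for one illustrative 'unit' (component)."""
    def advance(self, state, dt):
        raise NotImplementedError

class SplitHydro(Hydro):
    def advance(self, state, dt):
        return [s + dt for s in state]        # placeholder physics

class UnsplitHydro(Hydro):
    def advance(self, state, dt):
        return [s + 0.5 * dt for s in state]  # placeholder physics

# Registry of interchangeable implementations, selected at setup time.
HYDRO_IMPLEMENTATIONS = {"split": SplitHydro, "unsplit": UnsplitHydro}

def assemble_application(config):
    """Pick one implementation per component from the configuration."""
    return HYDRO_IMPLEMENTATIONS[config["hydro"]]()

if __name__ == "__main__":
    solver = assemble_application({"hydro": "unsplit"})
    print(solver.advance([1.0, 2.0], 0.1))
```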

    Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4)

    This report records and discusses the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4). It includes a description of the workshop's keynote presentation; the mission and vision statements that were drafted at the workshop and finalized shortly after it; a set of idea papers, position papers, experience papers, demos, and lightning talks; and a panel discussion. The main part of the report covers the working groups that formed during the meeting and, for each group, discusses the participants, the objective and goal, and how the objective can be reached, along with contact information for readers who may want to join the group. Finally, we present results from a survey of the workshop attendees.

    Star Formation in the First Galaxies I: Collapse Delayed by Lyman-Werner Radiation

    We investigate the process of metal-free star formation in the first galaxies with a high-resolution cosmological simulation. We consider the cosmologically motivated scenario in which a strong molecule-destroying Lyman-Werner (LW) background inhibits effective cooling in low-mass haloes, delaying star formation until the collapse of more massive haloes. Only when molecular hydrogen (H2) can self-shield from LW radiation, which requires a halo capable of cooling by atomic line emission, will star formation be possible. To follow the formation of multiple gravitationally bound objects, at high gas densities we introduce sink particles which accrete gas directly from the computational grid. We find that in a 1 Mpc^3 (comoving) box, runaway collapse first occurs in a 3x10^7 M_sun dark matter halo at z~12, assuming a background intensity of J21=100. Due to a runaway increase in the H2 abundance and cooling rate, a self-shielding, supersonically turbulent core develops abruptly with ~10^4 M_sun in cold gas available for star formation. We analyze the formation of this self-shielding core, the character of its turbulence, and the prospects for star formation. Due to a lack of fragmentation on the scales we resolve, we argue that LW-delayed metal-free star formation in atomic cooling haloes is very similar to star formation in primordial minihaloes, although in reaching this conclusion we ignore internal stellar feedback. Finally, we briefly discuss the detectability of metal-free stellar clusters with the James Webb Space Telescope.
    Comment: 22 pages, 1 new figure, accepted for publication in MNRAS
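
    The sink-particle device mentioned above can be illustrated with a toy sketch: cells whose gas density exceeds a chosen threshold hand their excess gas to a sink. The threshold, cell volume, and function name below are placeholder assumptions, not the paper's actual creation or accretion criteria.

```python
# A purely illustrative sketch of sink-particle accretion from a grid:
# gas above a density threshold in each cell is removed and credited to a sink.

import numpy as np

def accrete_onto_sinks(density, cell_volume, threshold):
    """Return (capped density field, list of (cell index, accreted mass))."""
    excess = np.clip(density - threshold, 0.0, None)   # gas above the threshold
    sinks = [(idx, m) for idx, m in np.ndenumerate(excess * cell_volume) if m > 0]
    return density - excess, sinks

if __name__ == "__main__":
    rho = np.array([[1.0, 5.0], [0.5, 12.0]])          # toy density field
    capped, sinks = accrete_onto_sinks(rho, cell_volume=1.0, threshold=4.0)
    print(capped)
    print(sinks)                                        # cells that formed sinks
```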

    Programming Abstractions for Data Locality

    The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component, but we are rapidly moving to an era in which computing is cheap and massively parallel while data movement dominates energy and performance costs. In order to prepare for exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor its applications to align with the emerging data-centric paradigm. Applications must be evolved to express information about data locality. Unfortunately, current programming environments offer few ways to do so; they ignore the cost of communication and simply rely on hardware cache coherence to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume that all processing elements are equidistant from each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions that describe data locality for the new computing ecosystem. The new programming paradigm should be more data-centric and should allow developers to describe how to decompose data and how to lay it out in memory. Fortunately, there are many emerging concepts, such as constructs for tiling, data layout, array views, task and thread affinity, and topology-aware communication libraries, for managing data locality. There is an opportunity to identify commonalities among these strategies and combine the best of these concepts into a comprehensive approach to expressing and managing data locality in exascale programming systems. Such programming-model abstractions can expose crucial information about data locality to the compiler and runtime system to enable performance-portable code. The research question is to identify the right level of abstraction; candidate techniques range from template libraries all the way to completely new languages.
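
    One of the locality-preserving abstractions the report lists, loop tiling, can be sketched as follows. The tile size, function name, and the transpose-and-add example are illustrative assumptions rather than anything prescribed by the report; the point is only that the traversal touches one small tile at a time so the working set stays cache-resident.

```python
# A minimal sketch of loop tiling for data locality: compute a + b.T one tile
# at a time so each tile of `a` and `b` is reused while it is still in cache.

import numpy as np

def tiled_add_transpose(a, b, tile=64):
    """Compute a + b.T tile by tile to keep the working set small."""
    n, m = a.shape
    out = np.empty_like(a)
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            i1, j1 = min(i0 + tile, n), min(j0 + tile, m)
            # Only this tile of `a` and the matching tile of `b` are touched here.
            out[i0:i1, j0:j1] = a[i0:i1, j0:j1] + b[j0:j1, i0:i1].T
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.random((256, 256)), rng.random((256, 256))
    assert np.allclose(tiled_add_transpose(a, b), a + b.T)
```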